
<!doctype html>
<html lang="en" class="no-js">
  <head>
    
      <meta charset="utf-8">
      <meta name="viewport" content="width=device-width,initial-scale=1">
      
      
        <meta name="author" content="Oliver Wang, Xiao-yang Liu">
      
      
      
      
        <link rel="next" href="jupyter/Data_Sources_News/">
      
      <link rel="icon" href="assets/images/favicon.png">
      <meta name="generator" content="mkdocs-1.4.2, mkdocs-material-9.1.6">
    
    
      
        <title>FinNLP</title>
      
    
    
      <link rel="stylesheet" href="assets/stylesheets/main.ded33207.min.css">
      
        
        <link rel="stylesheet" href="assets/stylesheets/palette.a0c5b2b5.min.css">
      
      

    
    
    
      
        
        
        <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
        <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto:300,300i,400,400i,700,700i%7CRoboto+Mono:400,400i,700,700i&display=fallback">
        <style>:root{--md-text-font:"Roboto";--md-code-font:"Roboto Mono"}</style>
      
    
    
    <script>__md_scope=new URL(".",location),__md_hash=e=>[...e].reduce((e,_)=>(e<<5)-e+_.charCodeAt(0),0),__md_get=(e,_=localStorage,t=__md_scope)=>JSON.parse(_.getItem(t.pathname+"."+e)),__md_set=(e,_,t=localStorage,a=__md_scope)=>{try{t.setItem(a.pathname+"."+e,JSON.stringify(_))}catch(e){}}</script>
    
      

    
    
    
  </head>
  
  
    
    
    
    
    
    <body dir="ltr" data-md-color-scheme="default" data-md-color-primary="" data-md-color-accent="">
  
    
    
    <input class="md-toggle" data-md-toggle="drawer" type="checkbox" id="__drawer" autocomplete="off">
    <input class="md-toggle" data-md-toggle="search" type="checkbox" id="__search" autocomplete="off">
    <label class="md-overlay" for="__drawer"></label>
    <div data-md-component="skip">
      
        
        <a href="#internet-scale-financial-data" class="md-skip">
          Skip to content
        </a>
      
    </div>
    <div data-md-component="announce">
      
    </div>
    
    
      

  

<header class="md-header md-header--shadow" data-md-component="header">
  <nav class="md-header__inner md-grid" aria-label="Header">
    <a href="." title="FinNLP" class="md-header__button md-logo" aria-label="FinNLP" data-md-component="logo">
      
  
  <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M12 8a3 3 0 0 0 3-3 3 3 0 0 0-3-3 3 3 0 0 0-3 3 3 3 0 0 0 3 3m0 3.54C9.64 9.35 6.5 8 3 8v11c3.5 0 6.64 1.35 9 3.54 2.36-2.19 5.5-3.54 9-3.54V8c-3.5 0-6.64 1.35-9 3.54Z"/></svg>

    </a>
    <label class="md-header__button md-icon" for="__drawer">
      <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M3 6h18v2H3V6m0 5h18v2H3v-2m0 5h18v2H3v-2Z"/></svg>
    </label>
    <div class="md-header__title" data-md-component="header-title">
      <div class="md-header__ellipsis">
        <div class="md-header__topic">
          <span class="md-ellipsis">
            FinNLP
          </span>
        </div>
        <div class="md-header__topic" data-md-component="header-topic">
          <span class="md-ellipsis">
            
              Home
            
          </span>
        </div>
      </div>
    </div>
    
    
      <div class="md-header__option">
        <div class="md-select">
          
          <button class="md-header__button md-icon" aria-label="Select language">
            <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="m12.87 15.07-2.54-2.51.03-.03A17.52 17.52 0 0 0 14.07 6H17V4h-7V2H8v2H1v2h11.17C11.5 7.92 10.44 9.75 9 11.35 8.07 10.32 7.3 9.19 6.69 8h-2c.73 1.63 1.73 3.17 2.98 4.56l-5.09 5.02L4 19l5-5 3.11 3.11.76-2.04M18.5 10h-2L12 22h2l1.12-3h4.75L21 22h2l-4.5-12m-2.62 7 1.62-4.33L19.12 17h-3.24Z"/></svg>
          </button>
          <div class="md-select__inner">
            <ul class="md-select__list">
              
                <li class="md-select__item">
                  <a href="/" hreflang="en" class="md-select__link">
                    English
                  </a>
                </li>
              
                <li class="md-select__item">
                  <a href="/zh/" hreflang="zh" class="md-select__link">
                    中文
                  </a>
                </li>
              
            </ul>
          </div>
        </div>
      </div>
    
    
    
  </nav>
  
</header>
    
    <div class="md-container" data-md-component="container">
      
      
        
          
        
      
      <main class="md-main" data-md-component="main">
        <div class="md-main__inner md-grid">
          
            
              
              <div class="md-sidebar md-sidebar--primary" data-md-component="sidebar" data-md-type="navigation" >
                <div class="md-sidebar__scrollwrap">
                  <div class="md-sidebar__inner">
                    


<nav class="md-nav md-nav--primary" aria-label="Navigation" data-md-level="0">
  <label class="md-nav__title" for="__drawer">
    <a href="." title="FinNLP" class="md-nav__button md-logo" aria-label="FinNLP" data-md-component="logo">
      
  
  <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M12 8a3 3 0 0 0 3-3 3 3 0 0 0-3-3 3 3 0 0 0-3 3 3 3 0 0 0 3 3m0 3.54C9.64 9.35 6.5 8 3 8v11c3.5 0 6.64 1.35 9 3.54 2.36-2.19 5.5-3.54 9-3.54V8c-3.5 0-6.64 1.35-9 3.54Z"/></svg>

    </a>
    FinNLP
  </label>
  
  <ul class="md-nav__list" data-md-scrollfix>
    
      
      
      

  
  
    
  
  
    <li class="md-nav__item md-nav__item--active">
      
      <input class="md-nav__toggle md-toggle" type="checkbox" id="__toc">
      
      
        
      
      
        <label class="md-nav__link md-nav__link--active" for="__toc">
          Home
          <span class="md-nav__icon md-icon"></span>
        </label>
      
      <a href="." class="md-nav__link md-nav__link--active">
        Home
      </a>
      
        

<nav class="md-nav md-nav--secondary" aria-label="Table of contents">
  
  
  
    
  
  
    <label class="md-nav__title" for="__toc">
      <span class="md-nav__icon md-icon"></span>
      Table of contents
    </label>
    <ul class="md-nav__list" data-md-component="toc" data-md-scrollfix>
      
        <li class="md-nav__item">
  <a href="#i-architecture" class="md-nav__link">
    Ⅰ. Architecture
  </a>
  
</li>
      
        <li class="md-nav__item">
  <a href="#ii-data-sources" class="md-nav__link">
    Ⅱ. Data Sources
  </a>
  
    <nav class="md-nav" aria-label="Ⅱ. Data Sources">
      <ul class="md-nav__list">
        
          <li class="md-nav__item">
  <a href="#1-news" class="md-nav__link">
    1. News
  </a>
  
</li>
        
          <li class="md-nav__item">
  <a href="#2-social-media" class="md-nav__link">
    2. Social Media
  </a>
  
</li>
        
          <li class="md-nav__item">
  <a href="#3-company-announcement" class="md-nav__link">
    3. Company Announcement
  </a>
  
</li>
        
          <li class="md-nav__item">
  <a href="#4-trends" class="md-nav__link">
    4. Trends
  </a>
  
</li>
        
          <li class="md-nav__item">
  <a href="#5-data-sets" class="md-nav__link">
    5. Data Sets
  </a>
  
</li>
        
      </ul>
    </nav>
  
</li>
      
        <li class="md-nav__item">
  <a href="#iii-models" class="md-nav__link">
    Ⅲ. Models
  </a>
  
    <nav class="md-nav" aria-label="Ⅲ. Models">
      <ul class="md-nav__list">
        
          <li class="md-nav__item">
  <a href="#1-fine-tuning-tensor-layers-lora" class="md-nav__link">
    1. Fine-tuning: Tensor Layers (LoRA)
  </a>
  
</li>
        
          <li class="md-nav__item">
  <a href="#2-fine-tuning-reinforcement-learning-on-stock-prices-rlsp" class="md-nav__link">
    2. Fine-tuning: Reinforcement Learning on Stock Prices (RLSP)
  </a>
  
</li>
        
      </ul>
    </nav>
  
</li>
      
        <li class="md-nav__item">
  <a href="#iv-applications" class="md-nav__link">
    Ⅳ. Applications
  </a>
  
    <nav class="md-nav" aria-label="Ⅳ. Applications">
      <ul class="md-nav__list">
        
          <li class="md-nav__item">
  <a href="#1-robo-advisor" class="md-nav__link">
    1. Robo Advisor
  </a>
  
</li>
        
          <li class="md-nav__item">
  <a href="#2-quantitative-trading" class="md-nav__link">
    2. Quantitative Trading
  </a>
  
</li>
        
          <li class="md-nav__item">
  <a href="#3-low-code-development" class="md-nav__link">
    3. Low-code development
  </a>
  
</li>
        
      </ul>
    </nav>
  
</li>
      
    </ul>
  
</nav>
      
    </li>
  

    
      
      
      

  
  
  
    <li class="md-nav__item">
      <a href="jupyter/Data_Sources_News/" class="md-nav__link">
        News
      </a>
    </li>
  

    
      
      
      

  
  
  
    <li class="md-nav__item">
      <a href="jupyter/Data_Sources_Social_Media/" class="md-nav__link">
        Social Media
      </a>
    </li>
  

    
      
      
      

  
  
  
    <li class="md-nav__item">
      <a href="jupyter/Data_Sources_Company_Announcement/" class="md-nav__link">
        Company Announcement
      </a>
    </li>
  

    
  </ul>
</nav>
                  </div>
                </div>
              </div>
            
            
              
              <div class="md-sidebar md-sidebar--secondary" data-md-component="sidebar" data-md-type="toc" >
                <div class="md-sidebar__scrollwrap">
                  <div class="md-sidebar__inner">
                    

<nav class="md-nav md-nav--secondary" aria-label="Table of contents">
  
  
  
    
  
  
    <label class="md-nav__title" for="__toc">
      <span class="md-nav__icon md-icon"></span>
      Table of contents
    </label>
    <ul class="md-nav__list" data-md-component="toc" data-md-scrollfix>
      
        <li class="md-nav__item">
  <a href="#i-architecture" class="md-nav__link">
    Ⅰ. Architecture
  </a>
  
</li>
      
        <li class="md-nav__item">
  <a href="#ii-data-sources" class="md-nav__link">
    Ⅱ. Data Sources
  </a>
  
    <nav class="md-nav" aria-label="Ⅱ. Data Sources">
      <ul class="md-nav__list">
        
          <li class="md-nav__item">
  <a href="#1-news" class="md-nav__link">
    1. News
  </a>
  
</li>
        
          <li class="md-nav__item">
  <a href="#2-social-media" class="md-nav__link">
    2. Social Media
  </a>
  
</li>
        
          <li class="md-nav__item">
  <a href="#3-company-announcement" class="md-nav__link">
    3. Company Announcement
  </a>
  
</li>
        
          <li class="md-nav__item">
  <a href="#4-trends" class="md-nav__link">
    4. Trends
  </a>
  
</li>
        
          <li class="md-nav__item">
  <a href="#5-data-sets" class="md-nav__link">
    5. Data Sets
  </a>
  
</li>
        
      </ul>
    </nav>
  
</li>
      
        <li class="md-nav__item">
  <a href="#iii-models" class="md-nav__link">
    Ⅲ. Models
  </a>
  
    <nav class="md-nav" aria-label="Ⅲ. Models">
      <ul class="md-nav__list">
        
          <li class="md-nav__item">
  <a href="#1-fine-tuning-tensor-layers-lora" class="md-nav__link">
    1. Fine-tuning: Tensor Layers (LoRA)
  </a>
  
</li>
        
          <li class="md-nav__item">
  <a href="#2-fine-tuning-reinforcement-learning-on-stock-prices-rlsp" class="md-nav__link">
    2. Fine-tuning: Reinforcement Learning on Stock Prices (RLSP)
  </a>
  
</li>
        
      </ul>
    </nav>
  
</li>
      
        <li class="md-nav__item">
  <a href="#iv-applications" class="md-nav__link">
    Ⅳ. Applications
  </a>
  
    <nav class="md-nav" aria-label="Ⅳ. Applications">
      <ul class="md-nav__list">
        
          <li class="md-nav__item">
  <a href="#1-robo-advisor" class="md-nav__link">
    1. Robo Advisor
  </a>
  
</li>
        
          <li class="md-nav__item">
  <a href="#2-quantitative-trading" class="md-nav__link">
    2. Quantitative Trading
  </a>
  
</li>
        
          <li class="md-nav__item">
  <a href="#3-low-code-development" class="md-nav__link">
    3. Low-code development
  </a>
  
</li>
        
      </ul>
    </nav>
  
</li>
      
    </ul>
  
</nav>
                  </div>
                </div>
              </div>
            
          
          
            <div class="md-content" data-md-component="content">
              <article class="md-content__inner md-typeset">
                
                  


<h1 id="internet-scale-financial-data">Internet-scale Financial Data</h1>
<p>The demos are available in <a href="https://github.com/AI4Finance-Foundation/ChatGPT-for-FinTech">FinGPT</a>.</p>
<p>For the Chinese version, click <a href="zh/index_zh/">here</a>.</p>
<p><strong>Disclaimer: We are sharing code for academic purposes under the MIT education license. Nothing herein is financial advice, and NOT a recommendation to trade real money. Please use common sense and always first consult a professional before trading or investing.</strong></p>
<h2 id="i-architecture">Ⅰ. Architecture</h2>
<p><img alt="image-20230505200244043" src="https://cdn.jsdelivr.net/gh/oliverwang15/imgbed@main/img/202305052002139.png" /></p>
<p>The whole project is made up of four parts:</p>
<ul>
<li>
<p>The first part is the <strong>Data Source</strong>, where we <strong>gather historical and streaming data</strong> from the Internet.</p>
</li>
<li>
<p>Next, we push the data to the <strong>Data Engineering</strong> part, where we <strong>clean the data, tokenize the data, and do the prompt engineering</strong>.</p>
</li>
<li>
<p>Then, the data is pushed to the <strong>LLMs</strong>, which we may use in different ways. We can not only use the collected data to train our own <strong>lightweight fine-tuned models</strong>, but also use the data together with <strong>trained models</strong> or <strong>LLM APIs</strong> to support our applications.</p>
</li>
<li>
<p>The last part is the <strong>Application</strong> part, where we use the data and LLMs to build many interesting applications.</p>
</li>
</ul>
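<p>The four-part flow above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the project's implementation: <code>Document</code>, <code>clean</code>, <code>build_prompt</code> and <code>run_pipeline</code> are hypothetical names, the prompt wording is illustrative, and the LLM is any callable from prompt to text (an LLM API client or a fine-tuned model). Tokenization is model-specific, so it is left to the chosen model's own tokenizer inside that callable.</p>

```python
import html
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class Document:
    source: str  # e.g. "Yahoo", "Twitter", "SEC"
    text: str

def clean(docs: Iterable[Document]) -> List[Document]:
    """Data Engineering: strip HTML tags, unescape entities, collapse
    whitespace, and drop exact duplicates (keeping the first copy)."""
    seen = set()
    out = []
    for d in docs:
        text = html.unescape(re.sub(r"<[^>]+>", " ", d.text))
        text = re.sub(r"\s+", " ", text).strip()
        if text and text not in seen:
            seen.add(text)
            out.append(Document(d.source, text))
    return out

def build_prompt(doc: Document) -> str:
    """Data Engineering: prompt engineering that wraps the cleaned text in an instruction."""
    return ("Instruction: What is the sentiment of this news? "
            "Answer with one of {positive, negative, neutral}.\n"
            f"Input: {doc.text}\n"
            "Answer:")

def run_pipeline(raw: Iterable[Document], llm: Callable[[str], str]) -> List[str]:
    """Data Source -> Data Engineering -> LLM. The returned answers are what an
    application (robo-advisor, trading signal, ...) would consume."""
    return [llm(build_prompt(d)) for d in clean(raw)]

# Usage with hypothetical inputs and a stand-in for an LLM API or fine-tuned model.
raw = [Document("Yahoo", "  Apple  beats\nestimates "),
       Document("Reuters", "Apple beats estimates")]
print(run_pipeline(raw, lambda prompt: "positive"))  # ['positive']
```

<p>Keeping the LLM behind a plain callable means the same cleaning and prompting code can feed either a hosted API or a locally fine-tuned model.</p>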
<h2 id="ii-data-sources">Ⅱ. Data Sources</h2>
<p><img alt="image-20230505200446477" src="https://cdn.jsdelivr.net/gh/oliverwang15/imgbed@main/img/202305052004539.png" /></p>
<ul>
<li>Due to space limitations, we only show a few of them.</li>
</ul>
<h3 id="1-news">1. <a href="jupyter/Data_Sources_News/">News</a></h3>
<table>
<thead>
<tr>
<th align="center">Platform</th>
<th align="center">Data Type</th>
<th align="center">Related Market</th>
<th align="center">Specified Company</th>
<th align="center">Range Type</th>
<th align="center">Source Type</th>
<th align="center">Limits</th>
<th>Docs (×10<sup>4</sup>)</th>
<th>Support</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">Yahoo</td>
<td align="center">Financial News</td>
<td align="center">US Stocks</td>
<td align="center">√</td>
<td align="center">Date Range</td>
<td align="center">Official</td>
<td align="center">N/A</td>
<td>1,500+</td>
<td>√</td>
</tr>
<tr>
<td align="center">Reuters</td>
<td align="center">Financial News</td>
<td align="center">US Stocks</td>
<td align="center">×</td>
<td align="center">Date Range</td>
<td align="center">Official</td>
<td align="center">N/A</td>
<td>1,500+</td>
<td>√</td>
</tr>
<tr>
<td align="center">Sina</td>
<td align="center">Financial News</td>
<td align="center">CN Stocks</td>
<td align="center">×</td>
<td align="center">Date Range</td>
<td align="center">Official</td>
<td align="center">N/A</td>
<td>2,000+</td>
<td>√</td>
</tr>
<tr>
<td align="center">Eastmoney</td>
<td align="center">Financial News</td>
<td align="center">CN Stocks</td>
<td align="center">√</td>
<td align="center">Date Range</td>
<td align="center">Official</td>
<td align="center">N/A</td>
<td>1,000+</td>
<td>√</td>
</tr>
<tr>
<td align="center">Yicai</td>
<td align="center">Financial News</td>
<td align="center">CN Stocks</td>
<td align="center">√</td>
<td align="center">Date Range</td>
<td align="center">Official</td>
<td align="center">N/A</td>
<td>500+</td>
<td>Soon</td>
</tr>
<tr>
<td align="center">CCTV</td>
<td align="center">Government News</td>
<td align="center">CN Stocks</td>
<td align="center">×</td>
<td align="center">Date Range</td>
<td align="center">Third party</td>
<td align="center">N/A</td>
<td>4</td>
<td>√</td>
</tr>
<tr>
<td align="center">US Mainstream</td>
<td align="center">Financial News</td>
<td align="center">US Stocks</td>
<td align="center">√</td>
<td align="center">Date Range</td>
<td align="center">Third party</td>
<td align="center">Account (Free)</td>
<td>3,200+</td>
<td>√</td>
</tr>
<tr>
<td align="center">CN Mainstream</td>
<td align="center">Financial News</td>
<td align="center">CN Stocks</td>
<td align="center">×</td>
<td align="center">Date Range</td>
<td align="center">Third party</td>
<td align="center">￥500/year</td>
<td>3,000+</td>
<td>√</td>
</tr>
</tbody>
</table>
<ul>
<li>Although FinGPT may have <strong>fewer docs</strong> than Bloomberg, we are on the <strong>same order of magnitude</strong>.</li>
</ul>
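<p>The Docs column is expressed in units of 10<sup>4</sup> documents, so an entry such as 1,500+ means at least 15 million documents. The short sketch below (our own illustration) converts the News table to raw counts and sums the sources currently marked as supported; because every entry is a lower bound, the result is a lower bound too.</p>

```python
import math

# Docs column of the News table, in units of 1e4 documents. Every entry is a
# lower bound ("1,500+"). Yicai is listed as "Soon", so it is left out here.
NEWS_DOCS_1E4 = {
    "Yahoo": 1_500, "Reuters": 1_500, "Sina": 2_000, "Eastmoney": 1_000,
    "CCTV": 4, "US Mainstream": 3_200, "CN Mainstream": 3_000,
}

def total_docs(table_1e4: dict) -> int:
    """Convert from units of 1e4 to raw document counts and sum."""
    return sum(count * 10_000 for count in table_1e4.values())

total = total_docs(NEWS_DOCS_1E4)
magnitude = math.floor(math.log10(total))
print(f"at least {total:,} news docs (order of 10^{magnitude})")
```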
<h3 id="2-social-media">2. <a href="jupyter/Data_Sources_Social_Media/">Social Media</a></h3>
<table>
<thead>
<tr>
<th align="center">Platform</th>
<th align="center">Data Type</th>
<th align="center">Related Market</th>
<th align="center">Specified Company</th>
<th align="center">Range Type</th>
<th align="center">Source Type</th>
<th align="center">Limits</th>
<th>Docs (×10<sup>4</sup>)</th>
<th align="center">Support</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">Twitter</td>
<td align="center">Tweets</td>
<td align="center">US Stocks</td>
<td align="center">√</td>
<td align="center">Date Range</td>
<td align="center">Official</td>
<td align="center">N/A</td>
<td>18,000+</td>
<td align="center">√</td>
</tr>
<tr>
<td align="center">StockTwits</td>
<td align="center">Tweets</td>
<td align="center">US Stocks</td>
<td align="center">√</td>
<td align="center">Latest</td>
<td align="center">Official</td>
<td align="center">N/A</td>
<td>160,000+</td>
<td align="center">√</td>
</tr>
<tr>
<td align="center">Reddit (wallstreetbets)</td>
<td align="center">Threads</td>
<td align="center">US Stocks</td>
<td align="center">×</td>
<td align="center">Latest</td>
<td align="center">Official</td>
<td align="center">N/A</td>
<td>9+</td>
<td align="center">√</td>
</tr>
<tr>
<td align="center">Weibo</td>
<td align="center">Tweets</td>
<td align="center">CN Stocks</td>
<td align="center">√</td>
<td align="center">Date Range</td>
<td align="center">Official</td>
<td align="center">Cookies</td>
<td>1,400,000+</td>
<td align="center">√</td>
</tr>
<tr>
<td align="center">Weibo</td>
<td align="center">Tweets</td>
<td align="center">CN Stocks</td>
<td align="center">√</td>
<td align="center">Latest</td>
<td align="center">Official</td>
<td align="center">N/A</td>
<td>1,400,000+</td>
<td align="center">√</td>
</tr>
</tbody>
</table>
<ul>
<li><strong>BloombergGPT</strong> does <strong>not collect social media data</strong>, but we believe that <strong>public opinion is one of the most important factors influencing the stock market</strong>.</li>
</ul>
<h3 id="3-company-announcement">3. <a href="jupyter/Data_Sources_Company_Announcement/">Company Announcement</a></h3>
<table>
<thead>
<tr>
<th align="center">Platform</th>
<th align="center">Data Type</th>
<th align="center">Related Market</th>
<th align="center">Specified Company</th>
<th align="center">Range Type</th>
<th align="center">Source Type</th>
<th align="center">Limits</th>
<th>Docs (×10<sup>4</sup>)</th>
<th align="center">Support</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">Juchao (Official Website)</td>
<td align="center">Text</td>
<td align="center">CN Stocks</td>
<td align="center">√</td>
<td align="center">Date Range</td>
<td align="center">Official</td>
<td align="center">N/A</td>
<td>2,790+</td>
<td align="center">√</td>
</tr>
<tr>
<td align="center">SEC (Official Website)</td>
<td align="center">Text</td>
<td align="center">US Stocks</td>
<td align="center">√</td>
<td align="center">Date Range</td>
<td align="center">Official</td>
<td align="center">N/A</td>
<td>1,440+</td>
<td align="center">√</td>
</tr>
</tbody>
</table>
<ul>
<li>Since we collect data from multiple stock markets, we have <strong>more filing docs</strong> than BloombergGPT.</li>
</ul>
<h3 id="4-trends">4. Trends</h3>
<table>
<thead>
<tr>
<th align="center">Platform</th>
<th align="center">Data Type</th>
<th align="center">Related Market</th>
<th align="center">Data Source</th>
<th align="center">Specified Company</th>
<th align="center">Range Type</th>
<th align="center">Source Type</th>
<th align="center">Limits</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center"><a href="https://trends.google.com/trends/explore">Google Trends</a></td>
<td align="center">Index</td>
<td align="center">US Stocks</td>
<td align="center"><a href="./finnlp/data_sources/trends/google.py">Google Trends</a></td>
<td align="center">√</td>
<td align="center">Date Range</td>
<td align="center">Official</td>
<td align="center">N/A</td>
</tr>
<tr>
<td align="center"><a href="https://index.baidu.com/v2/index.html#/">Baidu Index</a></td>
<td align="center">Index</td>
<td align="center">CN Stocks</td>
<td align="center">Soon</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
</tbody>
</table>
<h3 id="5-data-sets">5. Data Sets</h3>
<table>
<thead>
<tr>
<th align="center">Data Source</th>
<th align="center">Type</th>
<th align="center">Stocks</th>
<th align="center">Dates</th>
<th align="center">Available</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center"><a href="https://github.com/JinanZou/Astock">AShare</a></td>
<td align="center">News</td>
<td align="center">3680</td>
<td align="center">2018-07-01 to 2021-11-30</td>
<td align="center">√</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/yumoxu/stocknet-dataset">stocknet-dataset</a></td>
<td align="center">Tweets</td>
<td align="center">87</td>
<td align="center">2014-01-02 to 2015-12-30</td>
<td align="center">√</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/wuhuizhe/CHRNN">CHRNN</a></td>
<td align="center">Tweets</td>
<td align="center">38</td>
<td align="center">2017-01-03 to 2017-12-28</td>
<td align="center">√</td>
</tr>
</tbody>
</table>
<h2 id="iii-models">Ⅲ. Models</h2>
<p><img alt="image-20230505200618504" src="https://cdn.jsdelivr.net/gh/oliverwang15/imgbed@main/img/202305052006541.png" /></p>
<ul>
<li>In data-centric NLP, we don’t train the model from scratch. We only <strong>call APIs</strong> and do <strong>lightweight fine-tuning</strong>.</li>
<li>The left part shows some LLM APIs we may use, the middle part shows the models we may fine-tune, and the right part shows some of the <strong>fine-tuning methods</strong>.</li>
</ul>
<h3 id="1-fine-tuning-tensor-layers-lora">1. Fine-tuning: Tensor Layers (LoRA)</h3>
<p><img alt="image-20230505200944411" src="https://cdn.jsdelivr.net/gh/oliverwang15/imgbed@main/img/202305052009480.png" /></p>
<ul>
<li>In FinGPT, we fine-tune a pre-trained LLM on a new financial dataset. <strong>High-quality labeled data</strong> is one of the <strong>key factors</strong> behind many successful LLMs, including ChatGPT.</li>
<li>However, such high-quality labeled data are often very <strong>expensive and time-consuming</strong> to obtain, and labeling may require help from professional finance experts.</li>
<li>If our goal is to use LLMs to analyze finance-related text data and help with quantitative trading, why not <strong>let the market do the labeling</strong> for us?</li>
<li>So here, we take the percentage change in the stock price associated with each news item, split it into three groups (<strong>positive, negative, and neutral</strong>) using thresholds, and use these groups as the <strong>labels of the news sentiment</strong>.</li>
<li>Correspondingly, in the <strong>prompt engineering</strong> part, we ask the model to choose one of positive, negative, and neutral as its output, so that we make the best use of the pre-trained knowledge.</li>
<li>By using LoRA, we can reduce the number of trainable parameters <strong>from 6.17B to 3.67M</strong>.</li>
<li>As the table shows, compared with ChatGLM, FinGPT achieves large improvements on multiple metrics. Before fine-tuning, it may be <strong>inappropriate</strong> to <strong>use the model for quantitative trading directly</strong>: since most <strong>news titles are neutral</strong>, most of the <strong>original outputs of the LLM are neutral</strong>, so the LLM <strong>performs poorly on the positive and negative labels</strong>, which are exactly the <strong>labels</strong> that might be <strong>useful in quantitative trading</strong>.</li>
<li><strong>After fine-tuning</strong>, by contrast, we observe <strong>huge improvements in predicting the positive and negative labels</strong>.</li>
<li>This is also <strong>why the model can achieve positive trading results</strong>.</li>
</ul>
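<p>The market-labeling and prompt steps above can be sketched as follows. This is a hypothetical illustration: the 2% threshold, the prompt wording and the function names are our assumptions, and the percentage change is taken as an input because the exact price window used for labeling is not specified here.</p>

```python
from dataclasses import dataclass

LABELS = ("positive", "negative", "neutral")

@dataclass
class Example:
    prompt: str  # instruction + news text fed to the LLM
    label: str   # market-derived target answer

def label_from_return(pct_change: float, threshold: float = 2.0) -> str:
    """Let the market label the news: bucket the related stock's percentage
    price change into three classes. The 2.0% threshold is hypothetical."""
    if pct_change >= threshold:
        return "positive"
    if pct_change <= -threshold:
        return "negative"
    return "neutral"

def make_example(headline: str, pct_change: float, threshold: float = 2.0) -> Example:
    """Pair a prompt that restricts the answer to the three labels with the market label."""
    prompt = ("Instruction: What is the sentiment of this news? "
              "Please choose an answer from {positive/negative/neutral}.\n"
              f"Input: {headline}\n"
              "Answer: ")
    return Example(prompt, label_from_return(pct_change, threshold))

def parse_answer(output: str) -> str:
    """Map the model's free-form output back onto a label; default to neutral."""
    text = output.strip().lower()
    for label in LABELS:
        if text.startswith(label):
            return label
    return "neutral"

ex = make_example("Apple beats quarterly earnings estimates", pct_change=3.1)
print(ex.label)                    # positive
print(parse_answer(" Negative."))  # negative
```

<p>Asking for the same three words the labels use keeps the fine-tuning target aligned with what the pre-trained model already tends to produce.</p>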
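<p>LoRA keeps the pre-trained weight W<sub>0</sub> frozen and learns a low-rank update, h = W<sub>0</sub>x + BAx, with B ∈ ℝ<sup>d×r</sup> and A ∈ ℝ<sup>r×k</sup>, so each adapted matrix adds only r(d + k) trainable values. The sketch below reproduces the 3.67M figure under assumed settings: a ChatGLM-6B base (28 layers, hidden size 4096, fused <code>query_key_value</code> projection 4096 → 12288) with rank r = 8 applied only to that projection. These settings are our assumption, chosen because they match the reported count.</p>

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA learns B (d_out x r) and A (r x d_in) beside a frozen d_out x d_in
    weight W0, so it adds r * (d_in + d_out) trainable values per matrix."""
    return r * (d_in + d_out)

# Assumed configuration (not stated in the text): ChatGLM-6B with 28 layers,
# hidden size 4096, and LoRA rank 8 applied only to the fused
# query_key_value projection, which maps 4096 -> 3 * 4096 features.
NUM_LAYERS, HIDDEN, RANK = 28, 4096, 8

trainable = NUM_LAYERS * lora_trainable_params(HIDDEN, 3 * HIDDEN, RANK)
print(f"{trainable:,}")                     # 3,670,016
print(f"{trainable / 6.17e9:.4%} of 6.17B")
```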
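<p>Why a mostly-neutral model can look good while being of little use for trading is easy to see with a toy example. The labels below are hypothetical (8 of 10 headlines neutral): a model that always answers neutral scores 80% accuracy yet has an F1 of zero on the positive and negative classes, which are the ones a trading signal needs.</p>

```python
from typing import Dict, Sequence

def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str],
                 labels=("positive", "negative", "neutral")) -> Dict[str, float]:
    """One-vs-rest F1 = 2TP / (2TP + FP + FN) for each label."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in pairs)
        fp = sum(t != lab and p == lab for t, p in pairs)
        fn = sum(t == lab and p != lab for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[lab] = 2 * tp / denom if denom else 0.0
    return scores

# Hypothetical toy labels: 8 of 10 headlines are neutral.
y_true = ["neutral"] * 8 + ["positive", "negative"]
always_neutral = ["neutral"] * 10

print(accuracy(y_true, always_neutral))  # 0.8
print(per_class_f1(y_true, always_neutral))
```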
<h3 id="2-fine-tuning-reinforcement-learning-on-stock-prices-rlsp">2. Fine-tuning: Reinforcement Learning on Stock Prices (RLSP)</h3>
<p><img alt="image-20230505201209946" src="https://cdn.jsdelivr.net/gh/oliverwang15/imgbed@main/img/202305052012996.png" /></p>
<ul>
<li>Similarly, we may use Reinforcement Learning on Stock Prices (RLSP) in place of the Reinforcement Learning from Human Feedback (RLHF) used by ChatGPT.</li>
</ul>
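<p>One way to make the RLSP idea concrete is to let realized returns play the role of the reward model's score. The sketch below is our assumption, not a specification of RLSP: each sentiment answer is mapped to a long, short or flat position, and its reward is the percentage return that position would have earned. In an RLHF-style training loop (for example PPO), this scalar would take the place of the learned human-preference reward.</p>

```python
from typing import List, Tuple

# Position implied by each sentiment answer: long, short, or flat.
POSITION = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}

def rlsp_reward(answer: str, pct_change: float) -> float:
    """Hypothetical RLSP reward: the % return that a one-unit position in the
    direction of the model's answer would have earned (0 for 'neutral')."""
    return POSITION[answer] * pct_change

def mean_reward(batch: List[Tuple[str, float]]) -> float:
    """Average reward over (answer, realized % change) pairs from one rollout batch."""
    return sum(rlsp_reward(a, r) for a, r in batch) / len(batch)

# Hypothetical rollout: the model's answers and the returns that followed.
batch = [("positive", 3.0), ("negative", 1.5), ("neutral", -0.4)]
print(mean_reward(batch))  # 0.5
```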
<h2 id="iv-applications">Ⅳ. Applications</h2>
<h3 id="1-robo-advisor">1. Robo Advisor</h3>
<p><img alt="image-20230505201913233" src="https://cdn.jsdelivr.net/gh/oliverwang15/imgbed@main/img/202305052019296.png" /></p>
<ul>
<li><strong>ChatGPT can give investment advice just like a pro</strong>.</li>
<li>In this example, the <strong>rising stock price</strong> of Apple is <strong>consistent with</strong> ChatGPT’s <strong>prediction based on its analysis of the news</strong>.</li>
</ul>
<h3 id="2-quantitative-trading">2. Quantitative Trading</h3>
<p><img alt="image-20230505201841001" src="https://cdn.jsdelivr.net/gh/oliverwang15/imgbed@main/img/202305052018035.png" /></p>
<ul>
<li>We may also use news, social media tweets, or filings to <strong>build sentiment factors</strong>. The right part shows the trading results obtained solely from signals that ChatGPT derived from Twitter tweets; the data come from a dataset called <a href="https://github.com/yumoxu/stocknet-dataset">stocknet-dataset</a>.</li>
<li>As the figure shows, the trading signals generated by ChatGPT are <strong>strong enough</strong> that we may <strong>achieve good results by trading on Twitter sentiment factors alone</strong>.</li>
<li>We may therefore <strong>achieve even better results by combining them with price factors</strong>.</li>
</ul>
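<p>A sentiment factor of this kind can be sketched as follows: average the LLM's per-tweet labels into a daily score, then hold a long, short or flat position for the next day according to its sign. This is an illustrative sketch under our own assumptions (equal weighting of tweets, a zero entry threshold, and <code>next_day_returns[day]</code> meaning the return from that day's close to the next close, which avoids look-ahead); the data below are hypothetical, not taken from stocknet-dataset. A price factor could be blended into the daily score before the sign is taken.</p>

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Numeric score for each LLM sentiment label.
SCORE = {"positive": 1, "negative": -1, "neutral": 0}

def daily_sentiment_factor(tweets: List[Tuple[str, str]]) -> Dict[str, float]:
    """Average per-tweet labels into one score in [-1, 1] per day.

    tweets: (date, label) pairs, where each label was produced by the LLM.
    """
    by_day: Dict[str, List[int]] = defaultdict(list)
    for day, label in tweets:
        by_day[day].append(SCORE[label])
    return {day: float(mean(scores)) for day, scores in sorted(by_day.items())}

def strategy_returns(factor: Dict[str, float],
                     next_day_returns: Dict[str, float],
                     entry: float = 0.0) -> Dict[str, float]:
    """Long (+1) if the factor is above `entry`, short (-1) if below `-entry`,
    flat otherwise; each day earns the following day's return."""
    pnl = {}
    for day, f in factor.items():
        position = 1 if f > entry else (-1 if f < -entry else 0)
        pnl[day] = position * next_day_returns[day]
    return pnl

# Hypothetical toy data, for illustration only.
tweets = [("2015-01-02", "positive"), ("2015-01-02", "positive"),
          ("2015-01-02", "negative"), ("2015-01-05", "negative"),
          ("2015-01-05", "neutral")]
next_day_returns = {"2015-01-02": 0.01, "2015-01-05": -0.02}

factor = daily_sentiment_factor(tweets)
pnl = strategy_returns(factor, next_day_returns)
print(factor)
print(pnl)  # {'2015-01-02': 0.01, '2015-01-05': 0.02}
```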
<h3 id="3-low-code-development">3. Low-code development</h3>
<p><img alt="image-20230505202028292" src="https://cdn.jsdelivr.net/gh/oliverwang15/imgbed@main/img/202305052020363.png" /></p>
<ul>
<li>We can use LLMs to help us write code.</li>
<li>The right part shows how we can develop our factors and other code <strong>quickly and efficiently</strong>.</li>
</ul>





                
              </article>
            </div>
          
          
        </div>
        
      </main>
      
        <footer class="md-footer">
  
  <div class="md-footer-meta md-typeset">
    <div class="md-footer-meta__inner md-grid">
      <div class="md-copyright">
  
  
    Made with
    <a href="https://squidfunk.github.io/mkdocs-material/" target="_blank" rel="noopener">
      Material for MkDocs
    </a>
  
</div>
      
    </div>
  </div>
</footer>
      
    </div>
    <div class="md-dialog" data-md-component="dialog">
      <div class="md-dialog__inner md-typeset"></div>
    </div>
    
    <script id="__config" type="application/json">{"base": ".", "features": [], "search": "assets/javascripts/workers/search.208ed371.min.js", "translations": {"clipboard.copied": "Copied to clipboard", "clipboard.copy": "Copy to clipboard", "search.result.more.one": "1 more on this page", "search.result.more.other": "# more on this page", "search.result.none": "No matching documents", "search.result.one": "1 matching document", "search.result.other": "# matching documents", "search.result.placeholder": "Type to start searching", "search.result.term.missing": "Missing", "select.version": "Select version"}}</script>
    
    
      <script src="assets/javascripts/bundle.51198bba.min.js"></script>
      
    
  </body>
</html>